264 research outputs found

A Uniform Dependency Language for Improving Data Quality

Author: Fan Wenfei
Geerts Floris
Publication date: 01/01/2011

Edinburgh Research Explorer

Institutional Repository Universiteit Antwerpen

Extending Dependencies with Conditions

Author: Bravo Loreto
Fan Wenfei
Ma Shuai
Publication date: 01/01/2007

Edinburgh Research Explorer

A revival of integrity constraints for data cleaning

Author: Fan Wenfei
Geerts Floris
Jia Xibei
Publication date: 01/01/2008

Edinburgh Research Explorer

Institutional Repository Universiteit Antwerpen

Semandaq: a data quality system based on conditional functional dependencies

Author: Fan Wenfei
Geerts Floris
Jia Xibei
Publication date: 01/01/2008

Edinburgh Research Explorer

Institutional Repository Universiteit Antwerpen

Performance Guarantees for Distributed Reachability Queries

Author: Fan Wenfei
Wang Xin
Wu Yinghui
Publication date: 01/01/2012

In the real world, a graph is often fragmented and distributed across different sites. This highlights the need for evaluating queries on distributed graphs. This paper proposes distributed evaluation algorithms for three classes of queries: reachability for determining whether one node can reach another, bounded reachability for deciding whether there exists a path of a bounded length between a pair of nodes, and regular reachability for checking whether there exists a path connecting two nodes such that the node labels on the path form a string in a given regular expression. We develop these algorithms based on partial evaluation, to explore parallel computation. When evaluating a query Q on a distributed graph G, we show that these algorithms possess the following performance guarantees, no matter how G is fragmented and distributed: (1) each site is visited only once; (2) the total network traffic is determined by the size of Q and the fragmentation of G, independent of the size of G; and (3) the response time is decided by the largest fragment of G rather than the entire G. In addition, we show that these algorithms can be readily implemented in the MapReduce framework. Using synthetic and real-life data, we experimentally verify that these algorithms are scalable on large graphs, regardless of how the graphs are distributed. Comment: VLDB201
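
The partial-evaluation idea in this abstract can be illustrated with a small sketch. It is a simplification written for this listing, not the authors' algorithm: the fragment layout (`owned`, `edges`, `in_nodes`) and all function names are assumptions. Each site reports, for every node through which a path may enter it, which nodes of other sites (or the target) it can reach locally; a coordinator then combines these summaries.

```python
from collections import deque

def local_reach(edges, start):
    """Nodes reachable from start using only the given adjacency map."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def partial_eval(fragment, s, t):
    """Runs at one site: summarise what each entry point reaches locally.

    Hypothetical fragment layout: 'owned' is the set of nodes stored at
    the site, 'edges' maps owned nodes to successors (possibly owned by
    other sites), 'in_nodes' are owned nodes with an incoming cross edge.
    """
    owned, edges = fragment["owned"], fragment["edges"]
    entry = set(fragment["in_nodes"]) | ({s} if s in owned else set())
    summary = {}
    for u in entry:
        reached = local_reach(edges, u)
        # Keep only the target and nodes that belong to other sites.
        summary[u] = {v for v in reached if v == t or v not in owned} - {u}
    return summary

def reachable(fragments, s, t):
    """Coordinator: contact each site once, then search the small summary graph."""
    if s == t:
        return True
    dep = {}
    for frag in fragments:  # independent calls; could run in parallel
        for u, targets in partial_eval(frag, s, t).items():
            dep.setdefault(u, set()).update(targets)
    return t in local_reach(dep, s)

# Two sites; cross edges are b -> c and d -> e.
site1 = {"owned": {"a", "b", "e"}, "edges": {"a": ["b"], "b": ["c"]}, "in_nodes": {"e"}}
site2 = {"owned": {"c", "d", "f"}, "edges": {"c": ["d"], "d": ["e"], "f": ["c"]}, "in_nodes": {"c"}}
fragments = [site1, site2]
```

In this sketch the data returned by each site mentions only boundary nodes, the source, and the target, which mirrors the abstract's claim that traffic depends on the fragmentation rather than on the size of the whole graph.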

arXiv.org e-Print Archive

Edinburgh Research Explorer

Diversified Top-k Graph Pattern Matching

Author: Fan Wenfei
Wang Xin
Wu Yinghui
Publication date: 01/01/2013

Edinburgh Research Explorer

Making Queries Tractable on Big Data with Preprocessing

Author: Fan Wenfei
Geerts Floris
Neven Frank
Publication date: 01/01/2013

A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT0Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be effectively converted to Π-tractable queries by re-factorizing its data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠT0Q ⊂ P unless P = NC, i.e., the set ΠT0Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data.
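
The preprocess-then-query pattern described in this abstract can be illustrated with a deliberately simple sketch. The example (membership lookup over a sorted list) and the function names are assumptions chosen for this listing, not taken from the abstract: the data is sorted once off-line in polynomial time, after which each query costs a logarithmic number of comparisons instead of a linear scan.

```python
from bisect import bisect_left

def preprocess(values):
    """One-off, off-line step: build a sorted index (O(n log n))."""
    return sorted(values)

def contains(index, x):
    """Answer a membership query with binary search (O(log n) comparisons)."""
    i = bisect_left(index, x)
    return i < len(index) and index[i] == x

index = preprocess([7, 3, 9, 1])
```

The sketch is sequential; the abstract's notion of Π-tractability refers to parallel poly-logarithmic time, which this example only approximates in spirit.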

CiteSeerX

Edinburgh Research Explorer

Reasoning about Record Matching Rules

Author: Fan Wenfei
Jia Xibei
Li Jianzhong
Ma Shuai
Publication date: 01/01/2009

Edinburgh Research Explorer

Putting Context into Schema Matching

Author: Bohannon Philip
Elnahrawy Eiman
Fan Wenfei
Flaster Michael
Publication date: 01/01/2006

Edinburgh Research Explorer

Constraints for Semistructured Data and XML

Author: Buneman Peter
Fan Wenfei
Siméon Jérôme
Weinstein Scott
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2001

Integrity constraints play a fundamental role in database design. We review initial work on the expression of integrity constraints for semistructured data and XML.

CiteSeerX

Crossref

Edinburgh Research Explorer